---
title: "Replication file for Political Preferences, Policy Loss and Shifting Support for European Integration"
author: "Simon Hix and Bjørn Høyland"
format: pdf
editor: visual
execute:
  eval: false
---

## Replication material

In this document, we describe the steps used to collect, prepare, and analyse the data, and to visualize the results.

## Data

We rely on two data sources for our analysis:

1.  Eurobarometer surveys (1976 - 2023)

2.  EU legislation (regulations and directives) from EurLex (1953 - 2024)

### Eurobarometer

For the Eurobarometer, we merged all surveys that contained data on support for the EU and left-right self-placement. All of these surveys also contained information on age and education. More information about Eurobarometer can be found at the [Eurobarometer](https://europa.eu/eurobarometer/about/eurobarometer) webpage. The code below reproduces Figure 1.

```{r, render = TRUE, results="asis", eval=TRUE, warning=FALSE, message=FALSE}
source("eb_support.R")

```
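As an illustration of the merging step described above, the following is a minimal sketch, not the code in `eb_support.R`. It assumes each survey wave is available as a `data.table` and that the harmonised variable names match those used later in this document (`proeu`, `left_right`, `age`, `education`). The toy data and the names `surveys`, `required`, `keep` and `eb` are hypothetical.

```r
library(data.table)

# Toy survey waves (hypothetical values); real Eurobarometer files
# contain many more variables and respondents
surveys <- list(
  eb_a = data.table(year = 1990, proeu = c(1, 0), left_right = c(4, 6),
                    age = c(34, 58), education = c(2, 1)),
  eb_b = data.table(year = 1991, proeu = c(1, 1),
                    age = c(41, 29), education = c(3, 2)),  # lacks left_right
  eb_c = data.table(year = 1992, proeu = c(0, 1), left_right = c(7, 3),
                    age = c(62, 25), education = c(1, 3))
)

# Items a wave must contain to be included (assumed variable names)
required <- c("proeu", "left_right")

# Flag the waves that carry both required items
keep <- vapply(surveys, function(d) all(required %in% names(d)), logical(1))

# Stack the retained waves and record which wave each row came from
eb <- rbindlist(surveys[keep], idcol = "survey", fill = TRUE)
```

In this sketch the wave without left-right self-placement (`eb_b`) is dropped before stacking, so `eb` contains only respondents from waves with both items.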

### EurLex data

We interacted with the EurLex API via `library(EurLex)` to collect all EU directives and regulations. We stored the directives and regulations in folders, cleaned them (which included dropping all Commission legislation), and saved the result as a data frame in `legisdata.RData`. In the interest of space, we attach only the resulting data frame.

```{r, render=FALSE, eval = FALSE, warning=FALSE, message=FALSE}
source("collect_legislation.R")
collect_legislation("directive")
collect_legislation("regulation")
source("legislation.R")
```

## Measuring EU policy-output

Next, we calculated our six left-right measures:

1.  Keywords

2.  CMPgen

3.  CMPecon

4.  CMPsocial

5.  LLM

6.  Factor

The next set of scripts adds each of these measures as a column to the dataset loaded from `legisdata.RData`.

```{r, render = FALSE, eval = FALSE, warning=FALSE, message=FALSE}
library(data.table)
load("legisdata.RData")
source("keywords.R") 
source("manifesto.R")
source("llama.R")
source("factor.R")
save(legis, file ="all_years_legis.RData")
```

### Annualized EU-policy-output

Having built the legislation-level dataset, we now aggregate the scores by year. As the LLM measure is categorical, we use the logit transformation at the aggregated level. We then merge the legislative measures with the Eurobarometer data and store the result as `hh_data.RData`.

```{r, render =TRUE, eval=TRUE, warning=FALSE, message=FALSE}
source("prepare_analysis.R")
```
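The aggregation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `prepare_analysis.R`: it assumes that `llm_class` takes the values `"left"` and `"right"`, and that the logit is applied to the yearly share of acts classified as `"right"`. The toy data and the names `toy_legis`, `yearly`, `toy_eb` and `merged` are hypothetical.

```r
library(data.table)

# Toy legislation-level data (hypothetical values and categories)
toy_legis <- data.table(
  year      = c(2000, 2000, 2000, 2001, 2001, 2001, 2001),
  Factor    = c(-0.4, 0.1, 0.3, 0.2, 0.6, -0.2, 0.2),
  llm_class = c("left", "right", "right", "left", "left", "left", "right")
)

# Aggregate by year: mean of a continuous score and the share of
# acts classified as "right" by the categorical measure
yearly <- toy_legis[, .(
  Factor      = mean(Factor),
  share_right = mean(llm_class == "right")
), by = year]

# Logit transformation of the yearly share: log(p / (1 - p)).
# A share of exactly 0 or 1 would give -Inf or Inf and needs separate handling.
yearly[, LLM := qlogis(share_right)]

# Toy respondent-level data (hypothetical values)
toy_eb <- data.table(
  id         = 1:4,
  year       = c(2000, 2000, 2001, 2001),
  left_right = c(3, 7, 5, 8)
)

# Attach the yearly legislative measures to every respondent of that year
merged <- merge(toy_eb, yearly, by = "year", all.x = TRUE)
```

Each respondent inherits the legislative scores of the survey year, which is the structure that a respondent-level measure of policy loss would require.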

#### Examples

```{r, render = TRUE, results='hide', eval = TRUE, warning=FALSE, message=FALSE}
# packages used in this chunk: data.table syntax, dplyr verbs, LaTeX table export
library(data.table)
library(dplyr)
library(xtable)
examples <- legis[celex %in% c("32008L0115", "32003L0088", "32004L0025", "32006L0123"),
                  list(title, keywords,keywords_score, CMPgen, llm_class, CMPecon, 
                       CMPsocial, Factor, keywords_mean, gen_mean,econ_mean, 
                       social_mean, Factor_mean),]
# mean-centering examples
examples[,Keywords := round(keywords_score - keywords_mean, 2)]
examples[, CMPgen := round(CMPgen - gen_mean,2)]
examples[, CMPecon := round(CMPecon - econ_mean,2)]
examples[, CMPsocial := round(CMPsocial - social_mean,2)]
examples[, Factor := round(Factor - Factor_mean,2)]
examples <- examples |> 
  rename(LLM = llm_class) |>
  rename(Title = title) |> 
  select(Title, Keywords, CMPgen, CMPecon, CMPsocial, LLM, Factor) 
x <-xtable(examples, align = c("R{.5cm}","p{5cm}", "p{1.3cm}",
                               "p{1.2cm}",  "p{1.3cm}", 
                               "p{1.4cm}", " p{1cm}","p{1cm}"),
           caption = "Coding examples",label = "tab:examples")
print(x, floating = TRUE, include.rownames = FALSE, size = "footnotesize",
      file = "../tables/example.tex")
```

#### Descriptive data

```{r, eval=TRUE, warning=FALSE, message=FALSE}
library(ggplot2)
# make sure the output folder exists before ggsave() writes to it
dir.create("figures", showWarnings = FALSE)
hh_data |> 
  na.omit() |> 
  ggplot(aes(x = policy_loss_Factor, y = proeu)) +
  geom_smooth(method = "lm")+
  theme_minimal()+
  xlab("Policy loss")+
  ylab("Probability of supporting the EU") +
  theme(legend.title=element_blank())
ggsave("figures/policy_loss_eu.png",
       width = 6, height = 2.65, units = "in", dpi = 450)
```

```{r, eval = TRUE, warning=FALSE, message=FALSE}
# descriptive statistics table; labels follow the order of the selected columns
labs <- c("EU Support", "Policy-loss (Keywords)", "Policy-loss (CMPgen)", 
          "Policy-loss (CMPecon)", "Policy-loss (CMPsocial)", "Policy-loss (LLM)", "Policy-loss (Factor)",
          "Left-right", "Age", "Age squared", "Education", "Sex", "Identity")
hh_data |> 
  select(proeu,policy_loss_Keywords, policy_loss_CMPgen, policy_loss_CMPecon, 
         policy_loss_CMPsocial, policy_loss_LLM, policy_loss_Factor,
         left_right,  age, age_sq, education,sex, identity) |> 
vtable::st(out = "latex", labels = labs, title = "Descriptive statistics", 
           anchor = "tab:desc_summary")
```

#### Analysis

We are now ready to analyse the data.

```{r, render = TRUE, eval = TRUE, warning=FALSE, message=FALSE}
library(modelsummary)
library(marginaleffects)
library(tidyverse)
library(data.table)
library(ggplot2)
library(ggthemes); library(ggExtra); library(broom)
library(fixest);library(ggeffects)
load("hh_data.RData")
source("helpers.R")
dir.create("figures")
dir.create("tables")
## average policy loss over time

# scaling constant that maps EU support onto the secondary (right-hand) axis
factor <- 1.5


### Figure 3
hh_data |> 
ggplot(aes(x=year)) +
  geom_smooth(aes(y=policy_loss_Factor), method = "gam", 
              formula = y ~ s(x, k = 12, bs = "cs"), col="blue") +
  geom_smooth(aes(y=proeu*factor), method = "gam", 
              formula = y ~ s(x, k = 12, bs = "cs"), col="red") +
  scale_y_continuous(name = "Policy loss \n (blue)", 
                     sec.axis = sec_axis(~ . / factor,
                                         name = "Support for the EU \n (red)")) +
  theme_minimal() +
  xlab("")

ggsave("figures/policy_and_eu_support.png",
       width = 6, height = 4, units = "in", dpi = 450)

source("analysis.R")
```

All resulting tables and figures can be found in the `tables` and `figures` sub-folders, which are created when this script is run.

```{r, render = TRUE, eval = TRUE}
sessionInfo()
```
